
    NetShaper: A Differentially Private Network Side-Channel Mitigation System

    The widespread adoption of encryption in network protocols has significantly improved the overall security of many Internet applications. However, these protocols cannot prevent network side-channel leaks -- leaks of sensitive information through the sizes and timing of network packets. We present NetShaper, a system that mitigates such leaks based on the principle of traffic shaping. NetShaper's traffic shaping provides differential privacy guarantees while adapting to the prevailing workload and congestion conditions, and allows configuring a tradeoff between privacy guarantees and bandwidth and latency overheads. Furthermore, NetShaper provides a modular and portable tunnel-endpoint design that can support diverse applications. We present a middlebox-based implementation of NetShaper and demonstrate its applicability in a video streaming and a web service application.
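    The shaping decision can be pictured as a fixed-interval loop that draws the number of bytes to transmit from a DP mechanism rather than from the true queue length. The sketch below illustrates that idea with a Laplace mechanism; the parameter names and values are illustrative assumptions, not NetShaper's actual design (the paper's mechanism, intervals, and privacy accounting differ).

```python
import numpy as np

# Illustrative parameters (our assumptions, not NetShaper's):
EPSILON = 1.0         # per-decision privacy budget
SENSITIVITY = 1500.0  # max bytes one application message adds to the queue

def shaped_send_size(queue_bytes: int) -> int:
    """Pick a DP-noised number of bytes to transmit this interval.

    The decision depends on the true queue length only through the
    Laplace mechanism, so an observer of packet sizes and timing learns
    about the queued traffic only up to the DP guarantee; clamping to
    zero is post-processing and preserves the guarantee.
    """
    noise = np.random.laplace(0.0, SENSITIVITY / EPSILON)
    return max(0, int(queue_bytes + noise))

def transmit(queue: bytearray) -> bytes:
    """Emit exactly the shaped size: real bytes first, dummy padding after."""
    target = shaped_send_size(len(queue))
    real = bytes(queue[:target])
    del queue[:len(real)]
    padding = b"\x00" * (target - len(real))  # dummy cover traffic
    return real + padding
```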

    Packing Privacy Budget Efficiently

    Machine learning (ML) models can leak information about users, and differential privacy (DP) provides a rigorous way to bound that leakage under a given budget. This DP budget can be regarded as a new type of compute resource in workloads of multiple ML models training on user data. Once used, the DP budget is forever consumed, so it is crucial to allocate it as efficiently as possible to train as many models as possible. This paper presents a DP-budget scheduler that optimizes for efficiency. We formulate privacy scheduling as a new type of multidimensional knapsack problem, called privacy knapsack, which maximizes DP budget efficiency. We show that privacy knapsack is NP-hard, hence practical algorithms are necessarily approximate. We develop an approximation algorithm for privacy knapsack, DPK, and evaluate it on microbenchmarks and on a new synthetic private-ML workload we derived from the Alibaba ML cluster trace. We show that DPK (1) often approaches the efficiency-optimal schedule, (2) consistently schedules more tasks than a state-of-the-art privacy scheduling algorithm that focuses on fairness (1.3-1.7x in Alibaba, 1.0-2.6x in microbenchmarks), but (3) sacrifices some fairness for efficiency. Using DPK, DP ML operators should therefore be able to train more models on the same amount of user data while offering the same privacy guarantee to their users.
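    To make the problem shape concrete, here is a toy greedy heuristic for the multidimensional "privacy knapsack": each task demands some epsilon on each data block it touches, each block has a finite budget, and tasks are admitted in decreasing profit density. This is a generic multidimensional-knapsack heuristic offered only as an illustration; DPK's actual algorithm and efficiency objective are those of the paper, not this sketch.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    profit: float   # scheduler's efficiency value for this task
    demand: dict    # per-data-block epsilon demand, e.g. {"block1": 0.5}

def greedy_privacy_knapsack(tasks, capacity):
    """Greedy heuristic for the multidimensional 'privacy knapsack'.

    `capacity` maps each data block to its total DP budget (epsilon).
    Tasks are considered in decreasing profit density -- profit per unit
    of normalized budget demanded -- and admitted only if every per-block
    demand still fits.  A generic MDK heuristic, not the paper's DPK.
    """
    remaining = dict(capacity)

    def density(t):
        norm = sum(t.demand[b] / capacity[b] for b in t.demand)
        return t.profit / norm if norm else float("inf")

    scheduled = []
    for t in sorted(tasks, key=density, reverse=True):
        if all(t.demand[b] <= remaining.get(b, 0.0) for b in t.demand):
            for b in t.demand:
                remaining[b] -= t.demand[b]
            scheduled.append(t.name)
    return scheduled

# Example: budgets of 1.0 epsilon on two blocks; task B no longer fits
# after A and C are admitted.
cap = {"block1": 1.0, "block2": 1.0}
tasks = [Task("A", 2.0, {"block1": 0.6}),
         Task("B", 1.5, {"block1": 0.5, "block2": 0.5}),
         Task("C", 1.0, {"block2": 0.4})]
print(greedy_privacy_knapsack(tasks, cap))  # -> ['A', 'C']
```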

    Web Transparency for Complex Targeting: Algorithms, Limits, and Tradeoffs

    Big Data promises important societal progress but exacerbates the need for due process and accountability. Companies and institutions can now discriminate between users at an individual level using collected data or past behavior. Worse, today they can do so in near-perfect opacity. The nascent field of web transparency aims to develop the tools and methods necessary to reveal how information is used; today, however, it lacks robust tools that let users and investigators identify targeting on multiple inputs. Here, we formalize for the first time the problem of detecting and identifying targeting on combinations of inputs, and provide the first algorithm that is asymptotically exact. This algorithm is designed to serve as a theoretical foundation for future scalable and robust web transparency tools. It offers three key properties. First, it is service-agnostic and applies to a variety of settings under a broad set of assumptions. Second, its analysis delineates a theoretical detection limit that characterizes which forms of targeting can be distinguished from noise and which cannot. Third, it establishes fundamental tradeoffs that lead the way to new metrics for the science of web transparency. Understanding the tradeoff between effective targeting and targeting concealment lets us determine under which conditions predatory targeting can be made unprofitable by transparency tools.
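    A toy version of the underlying set-intersection intuition: populate auxiliary accounts with random subsets of the inputs, then, for every candidate input combination, compare the ad's frequency in accounts containing the whole combination against the rest. The sketch below is an illustrative stand-in under that simplified model, not the paper's asymptotically exact algorithm or its noise analysis.

```python
from itertools import combinations

def score_combinations(accounts, max_size=2):
    """Toy detector for targeting on combinations of inputs.

    `accounts` is a list of (inputs, saw_ad) pairs: `inputs` is the set
    of data items planted in that auxiliary account, `saw_ad` whether
    the ad under study appeared there.  For each candidate combination,
    compare the ad frequency in accounts containing the whole
    combination against the remaining accounts; a targeted combination
    should show a large gap.  Illustrative only -- not the paper's
    asymptotically exact algorithm.
    """
    universe = set().union(*(inp for inp, _ in accounts))
    scores = {}
    for k in range(1, max_size + 1):
        for combo in combinations(sorted(universe), k):
            inside = [ad for inp, ad in accounts if set(combo) <= inp]
            outside = [ad for inp, ad in accounts if not set(combo) <= inp]
            if inside and outside:
                scores[combo] = (sum(inside) / len(inside)
                                 - sum(outside) / len(outside))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```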

    Vers une plus grande transparence du Web (Towards Greater Transparency of the Web)

    The Web giants (Amazon, Google, and Twitter foremost among them) increasingly tap the wealth of "Big Data": they collect myriad data that they exploit in their personalized recommendation algorithms and their advertising campaigns. Such methods can considerably improve the services rendered to their users, but their opacity is a matter of debate. Indeed, no sufficiently robust tool exists today that can trace, across the Web, how online services use a user's data and information. Motivated by this lack of transparency, we developed a prototype named XRay that can predict which datum, among all those present in a user account, is responsible for the receipt of an ad. In this article, we present its principle as well as the results of our first experiments. At the same time, we introduce the very first theoretical model of the web transparency problem, and we interpret XRay's performance in light of the results we obtained in this model. In particular, we prove that Θ(log N) auxiliary user accounts, populated by a random process, suffice to determine which of the N data items present caused the receipt of an ad. We briefly discuss possible extensions and some open problems.
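    The Θ(log N) claim has a simple coding-theoretic intuition: if each of the N data items is placed in each auxiliary account independently with probability 1/2, each item acquires an m-bit presence signature, and m ≈ 3 log₂ N bits make all signatures distinct with high probability, so the pattern of accounts in which the ad appears pins down the responsible item. The Monte-Carlo check below works in this deliberately simplified model (deterministic ad delivery, a single targeted item); it is our illustration, not XRay's implementation.

```python
import math
import random

def identification_rate(n_inputs=64, trials=1000):
    """Monte-Carlo check of the Theta(log N) intuition.

    Each input is placed in each of m auxiliary accounts independently
    with probability 1/2, giving it an m-bit presence signature.  In this
    toy model the ad appears exactly in the accounts whose signature bit
    for the causing input is 1, so identification succeeds iff no other
    input shares that signature.  With m ~ 3*log2(N), collisions are rare.
    """
    m = 3 * math.ceil(math.log2(n_inputs))
    hits = 0
    for _ in range(trials):
        sigs = [tuple(random.getrandbits(1) for _ in range(m))
                for _ in range(n_inputs)]
        cause = random.randrange(n_inputs)
        if sigs.count(sigs[cause]) == 1:  # signature is unique
            hits += 1
    return hits / trials

print(identification_rate())  # typically ~1.0 for N = 64, m = 18
```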

    Boost: Effective Caching in Differentially-Private Databases

    Differentially private (DP) databases can enable privacy-preserving analytics over datasets or data streams containing sensitive personal records. In such systems, user privacy is a very limited resource that is consumed by every new query and hence must be aggressively conserved. We propose Boost, the most effective caching component for linear query workloads over DP databases. Boost builds upon private multiplicative weights (PMW), a DP mechanism that is powerful in theory but very ineffective in practice, and transforms it into a highly effective caching object, PMW-Bypass, which uses prior-query results obtained through an external DP mechanism to train a PMW to answer arbitrary future linear queries accurately and "for free" from a privacy perspective. We show that Boost with PMW-Bypass conserves significantly more budget than vanilla PMW and simpler cache designs: at least a 1.51-14.25x improvement in experiments on public Covid19 and CitiBike datasets. Moreover, Boost supports range-query workloads, such as time-series or streaming workloads, where opportunities exist to further conserve privacy budget through DP parallel composition and warm-starting of PMW state. Our work thus establishes both a coherent system design and the theoretical underpinnings for effective caching in DP databases.
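    A minimal sketch of the PMW core that Boost builds on: maintain a synthetic histogram over the data universe, answer a linear query as an inner product with it, and take a multiplicative-weights step whenever an externally obtained DP answer disagrees with the estimate by more than a threshold. Class and parameter names below are our own assumptions; PMW-Bypass itself adds the bypass logic, privacy accounting, and guarantees described in the paper.

```python
import numpy as np

class PMWCache:
    """Minimal private-multiplicative-weights cache (illustrative only).

    `hist` is a synthetic distribution over the data universe.  A linear
    query is a vector q in [0,1]^universe; its estimate is <q, hist>.
    When an externally computed DP answer disagrees with the estimate by
    more than `threshold`, the weights move multiplicatively toward
    consistency.  Once estimates agree, further queries can be answered
    from `hist` alone at no additional privacy cost (post-processing).
    """
    def __init__(self, universe_size, learning_rate=0.1, threshold=0.05):
        self.hist = np.full(universe_size, 1.0 / universe_size)
        self.lr = learning_rate
        self.threshold = threshold

    def estimate(self, q):
        """Answer a linear query from the synthetic histogram."""
        return float(q @ self.hist)

    def observe(self, q, dp_answer):
        """Feed a DP answer obtained from an external DP mechanism."""
        err = dp_answer - self.estimate(q)
        if abs(err) > self.threshold:
            # Multiplicative-weights step toward the DP answer.
            self.hist *= np.exp(self.lr * np.sign(err) * q)
            self.hist /= self.hist.sum()
        return abs(err) <= self.threshold  # True: answerable "for free"
```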